Latent-Variable Modeling of String Transductions with Finite-State Methods

Authors

  • Markus Dreyer
  • Jason Smith
  • Jason Eisner
Abstract

String-to-string transduction is a central problem in computational linguistics and natural language processing. It occurs in tasks as diverse as name transliteration, spelling correction, pronunciation modeling and inflectional morphology. We present a conditional loglinear model for string-to-string transduction, which employs overlapping features over latent alignment sequences, and which learns latent classes and latent string pair regions from incomplete training data. We evaluate our approach on morphological tasks and demonstrate that latent variables can dramatically improve results, even when trained on small data sets. On the task of generating morphological forms, we outperform a baseline method, reducing the error rate by up to 48%. On a lemmatization task, we reduce the error rates reported in Wicentowski (2002) by 38–92%.
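
The idea of a conditional loglinear model that sums over latent alignments can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy feature set with one feature per edit operation (rather than overlapping n-gram features), leaves out the latent classes and latent regions, and normalizes over a short, hand-picked candidate list instead of using finite-state machinery over all output strings. The names `alignment_sum`, `conditional_prob`, and the weight keys are hypothetical.

```python
import math
from functools import lru_cache


def alignment_sum(x, y, weights):
    """Sum of exp(score) over all monotone alignments of x to y.

    An alignment is a sequence of edit operations (copy/substitute,
    delete, insert). Its score is the sum of the weights of the
    operations it uses; missing weights default to 0.0.
    """
    def w(feature):
        return weights.get(feature, 0.0)

    @lru_cache(maxsize=None)
    def total(i, j):
        # Base case: both strings fully consumed, one empty alignment.
        if i == len(x) and j == len(y):
            return 1.0
        s = 0.0
        if i < len(x) and j < len(y):
            # Align x[i] with y[j]: a copy if equal, otherwise a substitution.
            feat = ("copy",) if x[i] == y[j] else ("sub", x[i], y[j])
            s += math.exp(w(feat)) * total(i + 1, j + 1)
        if i < len(x):
            # Delete x[i] (align it with the empty string).
            s += math.exp(w(("del", x[i]))) * total(i + 1, j)
        if j < len(y):
            # Insert y[j] (align the empty string with it).
            s += math.exp(w(("ins", y[j]))) * total(i, j + 1)
        return s

    return total(0, 0)


def conditional_prob(x, y, candidates, weights):
    """p(y | x), normalized over an assumed finite candidate set."""
    z = sum(alignment_sum(x, c, weights) for c in candidates)
    return alignment_sum(x, y, weights) / z


if __name__ == "__main__":
    # Hypothetical weights: reward copying, mildly reward inserting "e"/"d".
    weights = {("copy",): 2.0, ("ins", "e"): 0.5, ("ins", "d"): 0.5}
    candidates = ["walk", "walks", "walked"]
    for c in candidates:
        print(c, conditional_prob("walk", c, candidates, weights))
```

Because the alignment is never observed, the probability of an output string is obtained by summing over every alignment that produces it; the dynamic program above performs that sum without enumerating alignments one by one.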

Similar resources

Expressiveness of streaming string transducers

Streaming string transducers [1] define (partial) functions from input strings to output strings. A streaming string transducer makes a single pass through the input string and uses a finite set of variables that range over strings from the output alphabet. At every step, the transducer processes an input symbol, and updates all the variables in parallel using assignments whose right-hand-sides...

Finitary Compositions of Two-way Finite-State Transductions

The hierarchy of arbitrary compositions of two-way nondeterministic finite-state transductions collapses when restricted to finitary transductions, i.e., transductions that produce a finite set of outputs for each input. The hierarchy collapses to the class of nondeterministic MSO definable transductions, which is inside the second level of that hierarchy. It is decidable whether a composition ...

Internship report - Streaming String Transducers

In formal language theory, two very different models sometimes turn out to describe the same class of languages. This usually shows that there is a fundamental concept described by those models. A well-known example is the class of regular languages, which can be characterized by logic (monadic second order (MSO) logic), algebra (syntactic monoids), and many computational models (automata). In ...

30th International Conference on Foundations of Software

Streaming string transducers [1] define (partial) functions from input strings to output strings. A streaming string transducer makes a single pass through the input string and uses a finite set of variables that range over strings from the output alphabet. At every step, the transducer processes an input symbol, and updates all the variables in parallel using assignments whose righ...

Linear Transduction Grammars and Zipper Finite-State Transducers

We examine how the recently explored class of linear transductions relates to finite-state models. Linear transductions have been neglected historically, but have gained recent interest in statistical machine translation modeling, due to empirical studies demonstrating that their attractive balance of generative capacity and complexity characteristics leads to improved accuracy and speed in learnin...

Publication date: 2008